INSTRUCTIONS:
##). Do one set at a time.
KEYBOARD SHORTCUTS:
Assignment operator (<-): Alt+- for Windows and Option+- for Mac
Insert a code chunk: Ctrl+Alt+I for Windows and Command+Option+I for Mac
Run the current line: Ctrl+Enter for Windows and Command+Enter for Mac
Run the current chunk: Ctrl+Shift+Enter for Windows and Command+Shift+Enter for Mac
Insert a pipe (%>%): Ctrl+Shift+M for Windows and Command+Shift+M for Mac
contains() the term “delay” and the origin column. Filter this data to show all the flights that took off in the morning (before 12:00) from JFK in December. Make sure to use pipes between the select and filter commands. Refer to the shortcut for inserting a pipe (see above).
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.3.1
## v tibble 2.0.1 v dplyr 0.8.0.1
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts --------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(nycflights13)
str(flights)
## Classes 'tbl_df', 'tbl' and 'data.frame': 336776 obs. of 19 variables:
## $ year : int 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
## $ month : int 1 1 1 1 1 1 1 1 1 1 ...
## $ day : int 1 1 1 1 1 1 1 1 1 1 ...
## $ dep_time : int 517 533 542 544 554 554 555 557 557 558 ...
## $ sched_dep_time: int 515 529 540 545 600 558 600 600 600 600 ...
## $ dep_delay : num 2 4 2 -1 -6 -4 -5 -3 -3 -2 ...
## $ arr_time : int 830 850 923 1004 812 740 913 709 838 753 ...
## $ sched_arr_time: int 819 830 850 1022 837 728 854 723 846 745 ...
## $ arr_delay : num 11 20 33 -18 -25 12 19 -14 -8 8 ...
## $ carrier : chr "UA" "UA" "AA" "B6" ...
## $ flight : int 1545 1714 1141 725 461 1696 507 5708 79 301 ...
## $ tailnum : chr "N14228" "N24211" "N619AA" "N804JB" ...
## $ origin : chr "EWR" "LGA" "JFK" "JFK" ...
## $ dest : chr "IAH" "IAH" "MIA" "BQN" ...
## $ air_time : num 227 227 160 183 116 150 158 53 140 138 ...
## $ distance : num 1400 1416 1089 1576 762 ...
## $ hour : num 5 5 5 5 6 5 6 6 6 6 ...
## $ minute : num 15 29 40 45 0 58 0 0 0 0 ...
## $ time_hour : POSIXct, format: "2013-01-01 05:00:00" "2013-01-01 05:00:00" ...
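flights %>%
  # year, month and day are columns 1-3; contains("dep") keeps dep_time,
  # sched_dep_time and dep_delay, so dep_time is still available to filter()
  select(year, month, day, origin, contains("dep")) %>%
  # dep_time is stored as HHMM, so values below 1200 are before noon
  filter(month == 12, origin == "JFK", dep_time < 1200)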
Use top_n() to print the most delayed departures from NYC in 2013. Read the documentation for top_n() on tidyverse if you are confused.
flights %>%
  top_n(n = 5, wt = dep_delay)
# the five longest departure delays
# use = (not ==) here: we are assigning a value to a parameter,
# not testing equality
flights %>%
  filter(month == 6 & day > 15)
# needs ==; the original had =
# we are not setting a parameter value here; this is a condition to evaluate
arr_delay) using the dense_rank() helper function.
flights %>%
  filter(dense_rank(desc(arr_delay)) %in% 10:40)
# %in% is a matching operator, see notes
# dense_rank() gives tied values the same rank and leaves no gaps,
# so every flight tied at a rank from 10 to 40 is kept
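To see how ties behave under the different ranking helpers, here is a minimal sketch on a made-up vector (the `delays` values are invented for illustration and are not taken from flights). It compares dense_rank() with min_rank() and row_number(), then shows what %in% returns.

```r
library(dplyr)

# Made-up arrival delays containing one tie (illustrative values only)
delays <- c(10, 20, 20, 30)

# desc() reverses the order so the largest delay gets rank 1
dense  <- dense_rank(desc(delays))   # 3 2 2 1: ties share a rank, no gaps
minr   <- min_rank(desc(delays))     # 4 2 2 1: ties share a rank, gap after
rownum <- row_number(desc(delays))   # 4 2 3 1: ties broken by position

# %in% tests each element for membership in the set on the right
keep <- dense %in% 1:2               # FALSE TRUE TRUE TRUE
```

Two ranks select three values here because both tied delays sit at rank 2, which is why a dense_rank() filter over a range of ranks can return more rows than the range has ranks.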
flights %>%
  mutate(takeoff = if_else(dep_time < 1200, "Flight is AM", "Flight is PM"))
Use transmute() instead of mutate() to do the same. What is the difference between the two?
flights %>%
  transmute(takeoff = if_else(dep_time < 1200, "Flight is AM", "Flight is PM"))
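One way to answer the question is to compare the column names each verb returns. The sketch below uses a small made-up tibble (`toy` is a hypothetical stand-in whose column names mirror flights) so that it runs without nycflights13.

```r
library(dplyr)

# Hypothetical stand-in for flights; values are invented for illustration
toy <- tibble(
  origin   = c("JFK", "LGA", "EWR"),
  dep_time = c(517L, 1230L, NA)
)

# mutate() keeps every existing column and appends the new one
withMutate <- toy %>%
  mutate(takeoff = if_else(dep_time < 1200, "Flight is AM", "Flight is PM"))

# transmute() keeps only the columns created in the call
withTransmute <- toy %>%
  transmute(takeoff = if_else(dep_time < 1200, "Flight is AM", "Flight is PM"))

names(withMutate)      # "origin" "dep_time" "takeoff"
names(withTransmute)   # "takeoff"
```

In both results a missing dep_time gives a missing takeoff, because if_else() propagates NA from the condition.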
group_by(), summarise() and other functions you have learnt previously.
Use group_by() with mutate() to create a new variable called comparativeDelay, which is the difference between the departure delay and the average delay at each origin airport for every hour in 2013 (check out the time_hour variable in the flights data). Store the result in a variable called comparativeDelays.
comparativeDelays tibble by carriers to print the top 10 airlines with the greatest average comparative delays.
Use group_by() with filter() to print the 5 most delayed flights from each origin. Your printed tibble should have 15 rows.
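The grouped operations in these tasks could be approached along the following lines. This is a sketch only, not a model answer. It runs on a made-up tibble (`toy`, with invented values and flights-style column names), it assumes that "most delayed" means the largest dep_delay, and the names `carrierDelays`, `avgComparativeDelay` and `mostDelayed` are hypothetical. On the real flights data you would replace `toy` with `flights` and the cutoff 2 with 5.

```r
library(dplyr)

# Hypothetical stand-in for flights; all values are invented for illustration
toy <- tibble(
  origin    = c("JFK", "JFK", "JFK", "LGA", "LGA", "LGA"),
  carrier   = c("AA",  "B6",  "AA",  "B6",  "AA",  "B6"),
  time_hour = as.POSIXct(c("2013-01-01 05:00:00", "2013-01-01 05:00:00",
                           "2013-01-01 06:00:00", "2013-01-01 05:00:00",
                           "2013-01-01 05:00:00", "2013-01-01 06:00:00"),
                         tz = "UTC"),
  dep_delay = c(10, 50, 5, 0, 20, NA)
)

# Grouped mutate: mean() is computed within each origin/hour group,
# so each flight is compared with its own airport-hour average
comparativeDelays <- toy %>%
  group_by(origin, time_hour) %>%
  mutate(comparativeDelay = dep_delay - mean(dep_delay, na.rm = TRUE)) %>%
  ungroup()

# Average comparative delay per carrier, largest first, at most 10 rows
carrierDelays <- comparativeDelays %>%
  group_by(carrier) %>%
  summarise(avgComparativeDelay = mean(comparativeDelay, na.rm = TRUE)) %>%
  arrange(desc(avgComparativeDelay)) %>%
  head(10)

# Grouped filter: row_number() ranks within each origin and breaks ties
# by position, so each group contributes at most 2 rows here
mostDelayed <- toy %>%
  group_by(origin) %>%
  filter(row_number(desc(dep_delay)) <= 2) %>%
  ungroup()
```

row_number() is used for the grouped filter because it never assigns the same rank twice, which is what guarantees a fixed number of rows per origin. Rows with a missing dep_delay get a missing rank and are dropped by filter().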